A Comparison of Spectral Methods for Spoken Language Identification

Authors

Reza, Shaghayegh, Data Processing Research Center, Khajeh Nasiredin Tousi Advanced Technologies Development Research Institute

Kabudian, Jahanshah, Razi University, Kermanshah

Abstract:

Automatic spoken language identification is the task of determining the language of an utterance from the speech signal. Language identification systems fall into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as multi-dimensional feature vectors, and a statistical model of these features is then trained for each language. The Gaussian mixture model (GMM) is the most common statistical model in spectral-based language identification systems. In phonetic-based methods, by contrast, the speech signal is converted into a sequence of tokens using a hidden Markov model (HMM), and a language model is trained on the resulting sequence. PRLM, PPRLM, and PR-SVM are examples of phonetic-based methods. In the literature, a combination of phonetic-based and spectral-based systems is usually used to build a high-quality language identification system. Spectral-based methods have attracted particular attention from researchers because they do not need labeled data and usually achieve better results than phonetic approaches. This paper therefore introduces, implements, and compares different spectral methods for spoken language identification. The baseline spectral method is the Gaussian mixture model-universal background model (GMM-UBM). The MMI discriminative training method is used to improve the Gaussian model of each language. Moreover, to model the temporal dynamics of each language, the GMM is replaced with an ergodic hidden Markov model (EHMM). The GSV-SVM and GMM tokenizer methods are also implemented as two popular spectral approaches. In addition, recent methods for modeling speaker and channel variability are applied to language identification, including joint factor analysis (JFA), the identity vector (i-Vector), and several variability compensation methods used to improve the i-Vector results.
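The spectral scoring idea described above can be illustrated with a minimal sketch. It assumes diagonal-covariance GMMs stored as lists of `(weight, mean, var)` tuples, one-dimensional toy features, and hand-picked parameters; the function names and the data layout are hypothetical and are not taken from the paper. A real GMM-UBM system would train the UBM on pooled data and adapt each language model from it, which this sketch does not do.

```python
import math

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_avg_loglik(frames, gmm):
    """Average per-frame log-likelihood of the frames under a GMM.

    gmm is a list of (weight, mean, var) tuples (assumed layout).
    """
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)
        # Log-sum-exp over mixture components for numerical stability.
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total / len(frames)

def identify(frames, language_gmms, ubm):
    """Score each language model against the UBM; return (best language, scores)."""
    base = gmm_avg_loglik(frames, ubm)
    scores = {
        lang: gmm_avg_loglik(frames, gmm) - base
        for lang, gmm in language_gmms.items()
    }
    return max(scores, key=scores.get), scores

# Toy parameters chosen by hand for illustration only.
language_gmms = {
    "farsi": [(1.0, [0.0], [1.0])],
    "english": [(1.0, [3.0], [1.0])],
}
ubm = [(0.5, [0.0], [1.0]), (0.5, [3.0], [1.0])]
frames = [[0.1], [-0.2], [0.3]]

best, scores = identify(frames, language_gmms, ubm)
print(best, scores)
```

Each entry of `scores` plays the role of one element of a raw score vector: an average log-likelihood ratio between a language model and the background model.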
Furthermore, different score post-processing methods are applied to boost the performance of the language recognition systems. Each element of the raw score vector indicates the degree to which the speech signal belongs to a language. A post-processing method acts as a classifier on this vector and allows better language detection decisions by mapping the raw score vector to the space of target languages. Previous studies have employed various post-processing methods, including GMM, NN, SVM, and LLR. This study uses several score post-processing methods to improve the quality of language recognition. The goal of the experiments is to detect Farsi, English, and Arabic individually, and to distinguish all three simultaneously from other languages. The latter is also called open-set language identification. The data consist of two-sided conversations, whose quality is often poor because of strong noise, background speech or music, accents, and so on. The Gaussian mixture model-universal background model (GMM-UBM) was implemented as the baseline method. With this approach, the mean EER over the three target languages (Farsi, English, and Arabic) was 13.58. Experimental results indicated that a GMM language identification system trained with the MMI discriminative training algorithm is more effective than one trained only with the ML algorithm. More specifically, the mean EER over the three target languages was reduced by about 8 percent compared with GMM-UBM. The GMM tokenizer method was also tested as a novel spectral approach. With this method, the mean EER over the three target languages was about 5 percent better than with GMM-UBM. The GSV-SVM discriminative method was also used for language recognition.
The results of this method were considerably better than those of common spectral approaches: the mean EER over the three target languages was reduced by 11 percent compared with GMM-UBM. This study also addresses the low speed of this method by using model pushing. Two further novel methods, JFA and i-Vector, were implemented as well. According to the results, both provide better results than GMM-UBM: the mean EER over the three target languages is reduced by 1% with JFA and by 12% with i-Vector. Overall, the experimental results showed that i-Vector outperforms the other spectral language identification systems. This study is the result of seven years of research on spoken language identification at the advanced technology development center of Khajeh Nasiredin Tousi. Ongoing research includes studying and implementing novel spectral language identification algorithms such as PLDA, as well as state-of-the-art phonetic language identification methods, in order to combine the spectral and phonetic systems and eventually achieve a high-quality language identification system.
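The results above are reported as equal error rate (EER), the operating point at which the miss rate and the false-alarm rate are equal. The following is a minimal sketch of one common way to approximate EER from detection scores by sweeping a threshold over the observed values. The function name and the toy scores are hypothetical, and the paper does not state how its EER values were computed.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (as a fraction) by sweeping thresholds over observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = None, None
    for t in thresholds:
        # Miss: a target trial scored below the threshold.
        p_miss = sum(s < t for s in target_scores) / len(target_scores)
        # False alarm: a non-target trial scored at or above the threshold.
        p_fa = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(p_miss - p_fa)
        if best_gap is None or gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (p_miss + p_fa) / 2.0
    return eer

# Toy scores for one target language (illustrative values, not from the paper).
targets = [2.0, 1.5, 0.2, 1.0]
nontargets = [-1.0, 0.5, -0.3, 0.1]
eer = equal_error_rate(targets, nontargets)
print(eer)
```

A mean EER over several target languages, as reported in the abstract, would then be the average of the per-language values obtained this way.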

Journal title

Signal and Data Processing

volume 14 issue 1

pages 111-134

publication date 2017-06
